@guillesd commented on Oct 22, 2025

The idea of this PR is to make this tool a bit more useful (and visually appealing). In general, it is good to have better profiling tools for exploring queries.

EDIT: Initially this was only about the query graph visuals, but I decided to also expose the profiling functions available in the C++ API (similar to what the Go client does with the C API).

To use the tool:

1. Build this branch (see https://duckdb.org/docs/stable/dev/building/python).
2. Run the following in DuckDB:

   ```sql
   PRAGMA enable_profiling = 'json';
   PRAGMA profiling_output = './tmp/profile.json';
   SELECT ... FROM ...;
   ```

3. Run the script (I do it with uv):

   ```shell
   uv run python -m duckdb.query_graph tmp/complex_profile.json
   ```

If you want to use the profiling within the client:

```python
import duckdb

con = duckdb.connect()
con.enable_profiling()
con.execute("select 42")
profile_info = con.get_profiling_information()
# optionally
con.disable_profiling()
```
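As a rough sketch of consuming the result, here is one way the nested operator tree could be walked to print per-operator timings. This assumes `profile_info` is a nested dict shaped like the JSON dumps further down (a `children` list per node); the `walk` helper is purely illustrative, not part of the API, and the sample dict is abridged from the output below.

```python
# Abridged sample of the structure returned by get_profiling_information();
# the only assumption is that every node carries a "children" list.
sample = {
    "query_name": "",
    "cpu_time": 0.0,
    "children": [
        {
            "operator_name": "PROJECTION",
            "operator_timing": 0.00003275,
            "operator_cardinality": 1,
            "children": [
                {
                    "operator_name": "DUMMY_SCAN",
                    "operator_timing": 0.000005333,
                    "operator_cardinality": 1,
                    "children": [],
                }
            ],
        }
    ],
}


def walk(node, depth=0):
    """Recursively yield (depth, operator_name, operator_timing) per node.

    The root node describes the query itself and has no operator_name,
    so it is skipped; only operator nodes are yielded.
    """
    name = node.get("operator_name")
    if name is not None:
        yield depth, name, node["operator_timing"]
    for child in node.get("children", []):
        yield from walk(child, depth + 1)


for depth, name, timing in walk(sample):
    print("  " * depth + f"{name}: {timing:.9f}s")
```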

WIP: You can now import `translate_json_to_html` from `duckdb.query_graph` and do everything natively within the Python process:

```python
from duckdb.query_graph import translate_json_to_html  # maybe some better naming here

translate_json_to_html(input_text=profile_info, output_file='tmp/profile.html')
```

Which should yield something like the following screenshot:
(Screenshot from Oct 22, 2025: rendered HTML query graph.)

guillesd marked this pull request as draft on October 22, 2025 at 15:19
@guillesd commented:

The problem I'm facing now is that the profiling exposed by the API doesn't seem to preserve the float precision of CPU Time and overall Latency, and has some other differences:

```json
{
    "query_name": "",
    "total_bytes_written": 0,
    "latency": 0.0,
    "total_bytes_read": 0,
    "cumulative_rows_scanned": 0,
    "system_peak_buffer_memory": 0,
    "cumulative_cardinality": 0,
    "rows_returned": 0,
    "cpu_time": 0.0,
    "result_set_size": 0,
    "extra_info": {},
    "system_peak_temp_dir_size": 0,
    "blocked_thread_time": 0.0,
    "children": [
        {
            "operator_timing": 0.00003275,
            "operator_rows_scanned": 0,
            "operator_name": "PROJECTION",
            "operator_cardinality": 1,
            "cumulative_rows_scanned": 0,
            "cumulative_cardinality": 0,
            "cpu_time": 0.0,
            "result_set_size": 4,
            "operator_type": "PROJECTION",
            "extra_info": {
                "Projections": "42",
                "Estimated Cardinality": "1"
            },
            "children": [
                {
                    "operator_type": "DUMMY_SCAN",
                    "extra_info": {},
                    "cumulative_cardinality": 0,
                    "cumulative_rows_scanned": 0,
                    "operator_cardinality": 1,
                    "operator_rows_scanned": 0,
                    "operator_timing": 0.000005333,
                    "result_set_size": 4,
                    "cpu_time": 0.0,
                    "operator_name": "DUMMY_SCAN",
                    "children": []
                }
            ]
        }
    ]
}
```

Versus the same query using the PRAGMA approach within SQL:

```json
{
    "total_bytes_written": 0,
    "total_bytes_read": 0,
    "rows_returned": 1,
    "latency": 0.004957334,
    "result_set_size": 4,
    "cumulative_rows_scanned": 0,
    "cpu_time": 0.000059875000000000005,
    "extra_info": {},
    "system_peak_buffer_memory": 0,
    "blocked_thread_time": 0.0,
    "cumulative_cardinality": 2,
    "system_peak_temp_dir_size": 0,
    "query_name": "select 42;",
    "children": [
        {
            "total_bytes_written": 0,
            "result_set_size": 4,
            "operator_timing": 0.000030667,
            "operator_rows_scanned": 0,
            "cumulative_rows_scanned": 0,
            "operator_cardinality": 1,
            "operator_type": "PROJECTION",
            "total_bytes_read": 0,
            "operator_name": "PROJECTION",
            "cpu_time": 0.000059875000000000005,
            "extra_info": {
                "Projections": "42",
                "Estimated Cardinality": "1"
            },
            "cumulative_cardinality": 2,
            "children": [
                {
                    "cumulative_cardinality": 1,
                    "extra_info": {},
                    "operator_name": "DUMMY_SCAN",
                    "cpu_time": 0.000029208,
                    "cumulative_rows_scanned": 0,
                    "operator_rows_scanned": 0,
                    "operator_timing": 0.000029208,
                    "result_set_size": 4,
                    "total_bytes_read": 0,
                    "operator_type": "DUMMY_SCAN",
                    "total_bytes_written": 0,
                    "operator_cardinality": 1,
                    "children": []
                }
            ]
        }
    ]
}
```
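To make the differences concrete, here is a small stdlib-only sketch that diffs the shared top-level metrics between the two outputs. The dicts are abridged copies of the roots of the two profiles above; the `diff_metrics` helper is only an illustration, not part of any API.

```python
# Abridged top-level nodes from the two outputs above: the C++ API
# result vs. the PRAGMA result for the same `select 42`.
api_root = {
    "query_name": "",
    "latency": 0.0,
    "cpu_time": 0.0,
    "rows_returned": 0,
    "blocked_thread_time": 0.0,
}
pragma_root = {
    "query_name": "select 42;",
    "latency": 0.004957334,
    "cpu_time": 0.000059875000000000005,
    "rows_returned": 1,
    "blocked_thread_time": 0.0,
}


def diff_metrics(a, b):
    """Return the shared keys whose values differ between two profile nodes."""
    return sorted(k for k in a.keys() & b.keys() if a[k] != b[k])


# The API output zeroes out latency, cpu_time, and rows_returned and
# drops the query text, which is the discrepancy described above.
print(diff_metrics(api_root, pragma_root))
```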
